
Requirement: forward the logs collected by Flume to Spark Streaming for processing.
    Approach:
        1. Read the official documentation
        2. Configure Flume according to the documentation
        3. Add the required dependency
        4. Use FlumeUtils to receive the Flume data and create a DStream
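
Step 3 above corresponds to the spark-streaming-flume package that the spark-submit commands later in these notes pull in via --packages; in a Maven project the equivalent dependency would look like this (coordinates taken from that --packages option):

```xml
<!-- Spark Streaming Flume integration (Scala 2.11 build, Spark 2.2.0),
     matching the --packages coordinate used in the spark-submit commands below -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-flume_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
```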


Push-based integration: the Flume Agent pushes events to the Spark Streaming application
        Flume Agent configuration: flume_push_streaming.conf

        simple-agent.sources = netcat-source
        simple-agent.sinks = avro-sink
        simple-agent.channels = memory-channel

        simple-agent.sources.netcat-source.type = netcat
        simple-agent.sources.netcat-source.bind = hadoop000
        simple-agent.sources.netcat-source.port = 44444

        simple-agent.sinks.avro-sink.type = avro
        # hostname is the IP of the machine running the Spark program during local testing
        simple-agent.sinks.avro-sink.hostname = 192.168.0.139
        simple-agent.sinks.avro-sink.port = 41414

        simple-agent.channels.memory-channel.type = memory

        simple-agent.sources.netcat-source.channels = memory-channel
        simple-agent.sinks.avro-sink.channel = memory-channel
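
The receiving end of the avro sink above is the Spark Streaming application itself. A minimal sketch of FlumePushWordCount, using the push-mode API FlumeUtils.createStream (the word-count logic is an assumption based on the class name):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Sketch of FlumePushWordCount (word-count logic assumed from the class name).
// In push mode the Spark application listens as an Avro server and Flume's
// avro sink pushes events to it, so the app must be running before data arrives.
object FlumePushWordCount {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: FlumePushWordCount <hostname> <port>")
      System.exit(1)
    }
    val Array(hostname, port) = args

    val sparkConf = new SparkConf() // master/appName supplied via spark-submit
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Listen on <hostname>:<port>, matching avro-sink.hostname / avro-sink.port above
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port.toInt)

    flumeStream
      .map(event => new String(event.event.getBody.array()).trim) // event body = log line
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```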




        Test the Spark Streaming code in local mode (on 192.168.0.139)

Local test summary
        1) Start the Spark Streaming job
        2) Start the Flume agent
             flume-ng agent  \
                    --name simple-agent   \
                    --conf $FLUME_HOME/conf    \
                    --conf-file $FLUME_HOME/conf/flume_push_streaming.conf  \
                    -Dflume.root.logger=INFO,console
        3) Send data via telnet and watch the output in the IDEA console
            telnet hadoop000 44444


    spark-submit \
    --class com.xiaoxu.FlumePushWordCount \
    --master local[2] \
    --packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 \
    /home/hadoop/lib/*.jar \
    hadoop000 41414



======


Pull-based integration: the Spark Streaming application pulls events from a Flume sink

        Flume Agent configuration: flume_pull_streaming.conf

        simple-agent.sources = netcat-source
        simple-agent.sinks = spark-sink
        simple-agent.channels = memory-channel

        simple-agent.sources.netcat-source.type = netcat
        simple-agent.sources.netcat-source.bind = hadoop000
        simple-agent.sources.netcat-source.port = 44444

        simple-agent.sinks.spark-sink.type = org.apache.spark.streaming.flume.sink.SparkSink
        simple-agent.sinks.spark-sink.hostname = hadoop000
        simple-agent.sinks.spark-sink.port = 41414

        simple-agent.channels.memory-channel.type = memory

        simple-agent.sources.netcat-source.channels = memory-channel
        simple-agent.sinks.spark-sink.channel = memory-channel

        Note: start Flume first, then start the Spark Streaming application
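
On the Spark side, pull mode uses FlumeUtils.createPollingStream instead of createStream; a minimal sketch of FlumePullWordCount (word-count logic again assumed from the class name):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Sketch of FlumePullWordCount (word-count logic assumed from the class name).
// In pull mode Flume's custom SparkSink buffers events, and the Spark
// application polls them — which is why Flume must be started first.
object FlumePullWordCount {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: FlumePullWordCount <hostname> <port>")
      System.exit(1)
    }
    val Array(hostname, port) = args

    val sparkConf = new SparkConf() // master/appName supplied via spark-submit
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Poll <hostname>:<port>, matching spark-sink.hostname / spark-sink.port above
    val flumeStream = FlumeUtils.createPollingStream(ssc, hostname, port.toInt)

    flumeStream
      .map(event => new String(event.event.getBody.array()).trim)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Per the Spark Streaming + Flume integration guide, the custom SparkSink class also requires the spark-streaming-flume-sink_2.11 jar (and its scala-library and commons-lang3 dependencies) to be placed on the Flume agent's classpath.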




Local test summary

        1) 启动flume agent
            flume-ng agent  \
                    --name simple-agent   \
                    --conf $FLUME_HOME/conf    \
                    --conf-file $FLUME_HOME/conf/flume_pull_streaming.conf  \
                    -Dflume.root.logger=INFO,console
        2) Start the Spark Streaming job
        3) Send data via telnet and watch the output in the IDEA console
            telnet hadoop000 44444



spark-submit \
--class com.imooc.spark.FlumePullWordCount \
--master local[2] \
--packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 \
/home/hadoop/lib/sparktrain-1.0.jar \
hadoop000 41414







